
Add fast fail override for performance benchmarks #1769

Open
tadiwa-aizen wants to merge 5 commits into awslabs:main from tadiwa-aizen:fast-fail-benchmarks

Conversation

@tadiwa-aizen
Contributor

@tadiwa-aizen tadiwa-aizen commented Feb 18, 2026

What changed and why?
Added configurable fail-fast mode for benchmark sweeps to enable early termination on failures.

Previously, benchmark sweeps would run all parameter combinations regardless of failures, making it difficult to catch configuration errors early in long-running sweeps. This change adds a fail_fast parameter to the smart benchmark sweeper that, when enabled, runs benchmarks sequentially and stops immediately upon the first failure. This allows quick identification of issues during development without wasting resources on remaining benchmarks.

Implementation details:

  • When fail_fast=true: Benchmarks run one at a time (batch_size=1), checking status after each job
  • When fail_fast=false (default): All benchmarks run in one batch, maintaining existing behavior
  • On failure with fail_fast=true: Re-raises the original exception with full context (stack trace, error message, failed configuration)
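The three bullets above can be sketched as a standalone reconstruction (hypothetical helper names; the real sweeper hands batches to Hydra's launcher and inspects JobReturn.return_value):

```python
# Illustrative sketch of the fail_fast batching logic, under assumed semantics:
# JobResult stands in for Hydra's JobReturn, whose return_value property
# re-raises the job's exception if the job failed.
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence


@dataclass
class JobResult:
    """Stand-in for Hydra's JobReturn."""
    value: Any = None
    error: Optional[Exception] = None

    @property
    def return_value(self) -> Any:
        if self.error is not None:
            raise self.error  # accessing return_value raises if the job failed
        return self.value


def execute_batches(
    all_combinations: Sequence[Any],
    launch: Callable[[Sequence[Any]], List[JobResult]],
    fail_fast: bool = False,
) -> List[List[JobResult]]:
    returns: List[List[JobResult]] = []
    if not all_combinations:
        return returns
    # fail_fast=True -> batch_size=1: launch one job, inspect it, then continue
    batch_size = 1 if fail_fast else len(all_combinations)
    for i in range(0, len(all_combinations), batch_size):
        batch = all_combinations[i : i + batch_size]
        results = launch(batch)
        returns.append(results)
        if fail_fast:
            for r in results:
                _ = r.return_value  # re-raises the original exception, stopping the sweep
    return returns
```

With fail_fast=False the launcher is called once with the whole sweep; with fail_fast=True a failure surfaces right after the failing job and the remaining combinations are never launched.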

Does this change impact existing behavior?

No breaking changes. Default behavior (fail_fast=false) maintains existing functionality where all benchmarks run regardless of failures.

Testing

Unit tests in tests/test_execute_batches.py:

  • test_fail_fast_true_stops_on_first_failure: Verifies that with fail_fast=true, the sweeper stops after the first failed job and doesn't execute remaining benchmarks (validates call count = 2 for 3 jobs when 2nd fails).

  • test_fail_fast_false_continues_through_failures: Confirms that with fail_fast=false, all benchmarks run in a single batch and failures are captured without stopping execution.
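A self-contained sketch of the first test's shape (illustrative only, not the actual tests/test_execute_batches.py; FakeResult, run_sweep, and the job names are stand-ins):

```python
# Hedged sketch of test_fail_fast_true_stops_on_first_failure: a mocked
# launcher whose 2nd job fails should be called exactly twice for 3 jobs.
from unittest.mock import Mock


class FakeResult:
    """Stand-in for a launcher result whose return_value raises on failure."""

    def __init__(self, error=None):
        self.error = error

    @property
    def return_value(self):
        if self.error is not None:
            raise self.error
        return "ok"


def run_sweep(launcher, jobs, fail_fast):
    """Minimal stand-in for the sweeper's batching loop."""
    batch_size = 1 if fail_fast else len(jobs)
    for i in range(0, len(jobs), batch_size):
        results = launcher.launch(jobs[i : i + batch_size])
        if fail_fast:
            for r in results:
                _ = r.return_value


def test_fail_fast_true_stops_on_first_failure():
    launcher = Mock()
    # 1st job succeeds, 2nd fails, 3rd would succeed if it were ever launched
    launcher.launch.side_effect = [
        [FakeResult()],
        [FakeResult(error=RuntimeError("job 2 failed"))],
        [FakeResult()],
    ]
    try:
        run_sweep(launcher, ["j1", "j2", "j3"], fail_fast=True)
        raise AssertionError("expected RuntimeError")
    except RuntimeError:
        pass
    assert launcher.launch.call_count == 2  # 3rd job never launched
```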

Run tests with:

cd mountpoint-s3/benchmark
uv run pytest tests/test_fail_fast.py -v

Does this change need a changelog entry? Does it require a version change?

No - this only affects internal benchmark tooling, not the Mountpoint binary or published crates.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

Signed-off-by: Tadiwa Magwenzi <tadiwaom@amazon.com>
@tadiwa-aizen tadiwa-aizen marked this pull request as ready for review February 19, 2026 19:23
Signed-off-by: Tadiwa Magwenzi <tadiwaom@amazon.com>
# Accessing return_value raises an exception if the job failed (hydra/core/utils.py:251-258)
if self.fail_fast:
    for r in results:
        _ = r.return_value  # Raises on failure, stopping the sweep
Contributor

What happens if a job fails? Does the exception message or logs get captured clearly so users know which benchmark failed and why?

_target_: str = "hydra_plugins.smart_sweeper.smart_benchmark_sweeper.SmartBenchmarkSweeper"
max_batch_size: Optional[int] = None
params: Optional[Dict[str, str]] = None
fail_fast: bool = False
Contributor

Is this worth documenting somewhere? Users should know when and why they'd want to enable fail_fast=true. Consider this if it feels reasonable.

returns.append(results)

# Determine batch size: run all at once (fail_fast=False) or one at a time (fail_fast=True)
batch_size = 1 if self.fail_fast else len(all_combinations)
Contributor

Why should fail_fast impact the batch size? Wouldn't it be faster to run in parallel but cancel on failure?

Contributor Author

Our SmartBenchmarkSweeper uses Hydra's BasicLauncher by default, which executes jobs sequentially, not in parallel. With fail_fast=True and batch_size=1, the sweeper passes one job at a time to the launcher, gets the result back immediately, and can check whether it failed before launching the next job. If we used batch_size=len(all_combinations), the launcher would run all jobs sequentially and only return after completing every single one, so we'd waste time running jobs after a failure instead of stopping early. This fits our requirements.

(On parallel jobs: Hydra does support parallel launchers like JoblibLauncher, but they don't support canceling in-flight jobs on first failure; they run all submitted jobs to completion. I'm also not sure what effect parallel execution might have on the benchmark results in general.)

# Accessing return_value raises an exception if the job failed (hydra/core/utils.py:251-258)
if self.fail_fast:
    for r in results:
        _ = r.return_value  # Raises on failure, stopping the sweep
Contributor

Does accessing return_value raise on failure?

for r in results:
    _ = r.return_value  # Raises on failure, stopping the sweep

initial_job_idx += len(batch)
Contributor

Is initial_job_idx the same as i?

# Determine batch size: run all at once (fail_fast=False) or one at a time (fail_fast=True)
batch_size = 1 if self.fail_fast else len(all_combinations)

for i in range(0, len(all_combinations), batch_size):
Contributor

There should be a test for this.

"""

# Create mock launcher and setup
mock_launcher = Mock()
Contributor

Extract to function build_smart_sweeper?

sweeper_normal._extract_benchmark_types = Mock(return_value=['fio'])

# Test the batching logic in the sweeper
all_combinations = combinations
Contributor

Why rename the variable?


# Test the batching logic in the sweeper
all_combinations = combinations
batch_size = 1 if sweeper_normal.fail_fast else len(all_combinations)
Contributor

This isn't testing anything useful.


# Test with fail_fast=True (should batch one at a time)
sweeper_fast_fail = SmartBenchmarkSweeper(fail_fast=True)
batch_size_fast_fail = 1 if sweeper_fast_fail.fail_fast else len(all_combinations)
Contributor

Similarly, this isn't testing anything useful.

results = [mock_result_success]

# Accessing return_value on a successful job should NOT raise an exception
try:
Contributor

This is not useful.

Signed-off-by: Tadiwa Magwenzi <tadiwaom@amazon.com>
@tadiwa-aizen tadiwa-aizen requested a deployment to PR integration tests March 2, 2026 01:23 — with GitHub Actions Waiting
results = self.launcher.launch(all_combinations, initial_job_idx=initial_job_idx)
returns = self._execute_batches(all_combinations, initial_job_idx)

return returns
Contributor

Is this meant to return nothing if all_combinations is empty?

["benchmark_type=fio", "mountpoint.stub_mode=off", "network.maximum_throughput_gbps=100"],
]

def test_fail_fast_true_stops_on_first_failure(self):
Contributor

These tests look better, though have you verified that it actually behaves this way when called from the CLI?

Contributor Author

Yes, verified with a real CLI run. Configured a benchmark sweep with 36 total combinations and intentionally used an invalid network interface name (ens129 instead of ens33) to trigger a failure.

Results with fail_fast=true:

  • Job #0: SUCCESS (fio, single NIC)
  • Job #1: SUCCESS (fio, single NIC, direct_io=True)
  • Job #2: FAILED (fio, dual NIC with invalid ens129 interface)
  • Stopped immediately; remaining 33 jobs never executed

The framework correctly caught the CalledProcessError from the mount-s3 command failing with AWS_ERROR_INVALID_ARGUMENT, and stopped execution as expected.

assert mock_launcher.launch.call_count == 1
assert len(results) == 1
assert len(results[0]) == 3
assert results[0][1].status == JobStatus.FAILED # Verify failure is captured
Contributor

Missing verification that the other 2 cases succeeded.

mock_launcher = Mock()
sweeper.launcher = mock_launcher

# side_effect makes mock return different values on each call: 1st call gets 1st item, 2nd call gets 2nd item, etc.
Contributor

We don't need excess comments describing what the code does directly.

Signed-off-by: Tadiwa Magwenzi <tadiwaom@amazon.com>
@tadiwa-aizen tadiwa-aizen requested a deployment to PR integration tests March 3, 2026 18:24 — with GitHub Actions Waiting
@tadiwa-aizen tadiwa-aizen requested a review from muddyfish March 3, 2026 18:26